Term importance, Boolean conjunct training, negative terms, and foreign language retrieval: probabilistic algorithms at TREC-5

Authors

Fredric C. Gey

Aitao Chen

Jianzhang He

Liangjie Xu

Jason Meggs

Abstract

The Berkeley experiments for TREC-5 extend those of TREC-4 in numerous ways. For routing retrieval we experimented with the idea of term importance in three ways: training on Boolean conjuncts of the most important terms, filtering with the most important terms, and, finally, logistic regression on presence or absence of those terms. For ad-hoc retrieval we retained the manual reformulations of the topics and experimented with negative query terms. The ad-hoc retrieval formula originally devised for TREC-2 has proven to be robust, and was used for the TREC-5 ad-hoc retrieval and for our Chinese and Spanish retrieval. Chinese retrieval was accomplished through development of a segmentation algorithm which was used to augment a Chinese dictionary. The manual query run BrklyCH2 achieved a spectacular 97.48 percent recall over the 19 queries evaluated before the conference.
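The third routing variant, logistic regression on the presence or absence of important terms, can be illustrated with a small sketch. The abstract does not give the feature set, training procedure, or data used at Berkeley, so everything below is an assumption made for illustration: the term list, the toy documents, the learning rate, and the helper names `presence_features`, `train_logistic`, and `score` are all hypothetical.

```python
import math

def presence_features(doc_tokens, important_terms):
    """Binary vector: 1.0 if the term occurs in the document, else 0.0."""
    present = set(doc_tokens)
    return [1.0 if t in present else 0.0 for t in important_terms]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def score(w, x):
    """Estimated probability of relevance for one feature vector."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical important terms and judged training documents (1 = relevant).
terms = ["merger", "bank"]
train_docs = [
    (["bank", "merger", "approved"], 1),
    (["merger", "talks"], 1),
    (["river", "bank"], 0),
    (["weather", "report"], 0),
]
X = [presence_features(tokens, terms) for tokens, _ in train_docs]
y = [label for _, label in train_docs]
weights = train_logistic(X, y)
```

Incoming documents would then be ranked by `score(weights, presence_features(tokens, terms))`. In this toy data only "merger" separates the relevant documents, so the fitted model scores a merger-only document above 0.5 and a bank-only document below it.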


Similar resources

Relating the new language models of information retrieval to the traditional retrieval models

During the last two years, exciting new approaches to information retrieval were introduced by a number of different research groups that use statistical language models for retrieval. This paper relates the retrieval algorithms suggested by these approaches to widely accepted retrieval algorithms developed within three traditional models of information retrieval: the Boolean model, the vector ...


Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...


Relevance Feedback for Best Match Term Weighting Algorithms in Information Retrieval

Personalisation in full text retrieval or full text filtering implies reweighting of the query terms based on some explicit or implicit feedback from the user. Relevance feedback inputs the user’s judgements on previously retrieved documents to construct a personalised query or user profile. This paper studies relevance feedback within two probabilistic models of information retrieval: the firs...


JHU/APL at TREC 2002: Experiments in Filtering and Arabic Retrieval

For ranked retrieval, we relied on a statistical language model to compute query/document similarity values. Hiemstra and de Vries describe such a linguistically motivated probabilistic model and explain how it relates to both the Boolean and vector space models [4]. The model has also been cast as a rudimentary Hidden Markov Model [13]. Although the model does not explicitly incorporate invers...


Publication date: 1996
